REPORT

The  main  activity  of  my  research  group  is  to  build  and  develop  the  probabilistic  pipeline.  When  solving 
problems  with  data,  we  take  the  following  steps. 

1.  We  make  assumptions  about  our  data,  embedding  it  in  a  probability  model  containing  hidden  and 
observed  random  variables. 

2.  Given  observations,  we  use  inference  algorithms  to  estimate  the  conditional  distribution  of  the  hidden 
variables.  This  is  the  central  statistical  and  computational  problem. 

3.  With  the  results  of  inference,  we  use  our  model  to  form  predictions  about  the  future,  explore  the  data, 
or  otherwise  apply  what  we  learned  to  solve  a  problem. 

4.  We  criticize  our  model,  understand  where  it  went  right  and  wrong,  and  repeat  the  process  to  revise  it. 
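
The four steps above can be illustrated with a deliberately small example. The sketch below is not code from this project. It assumes a hypothetical Beta-Bernoulli model (one hidden success probability, binary observations), and all function names are illustrative. Inference is exact here because the model is conjugate; the criticism step is a simple posterior predictive check on the number of switches in the sequence.

```python
import random

def infer(data, a=1.0, b=1.0):
    """Step 2: exact posterior Beta(a + ones, b + zeros), available by conjugacy."""
    ones = sum(data)
    return a + ones, b + len(data) - ones

def predict(posterior):
    """Step 3: posterior predictive probability that the next observation is 1."""
    a, b = posterior
    return a / (a + b)

def count_switches(seq):
    """Discrepancy statistic: number of adjacent pairs that differ."""
    return sum(1 for u, v in zip(seq, seq[1:]) if u != v)

def criticize(data, posterior, n_rep=2000, seed=0):
    """Step 4: fraction of replicated data sets with at most as many switches as observed."""
    rng = random.Random(seed)
    a, b = posterior
    observed = count_switches(data)
    extreme = 0
    for _ in range(n_rep):
        theta = rng.betavariate(a, b)                        # draw hidden variable
        rep = [1 if rng.random() < theta else 0 for _ in data]  # simulate data
        if count_switches(rep) <= observed:
            extreme += 1
    return extreme / n_rep

# Step 1: assume theta ~ Beta(1, 1) and independent x_i ~ Bernoulli(theta).
# The hypothetical data below are strongly ordered, which the model cannot explain.
data = [1] * 10 + [0] * 10
posterior = infer(data)
p_next = predict(posterior)
ppc_p_value = criticize(data, posterior)
```

In this toy run the prediction looks reasonable, but the check returns a very small value, signaling that the independence assumption is wrong for these data. That is the cue to return to step 1 and revise the model.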

The pipeline cleanly divides the essential activities of data analysis and facilitates collaborative solutions
to data science problems. Building models and using them are activities that require domain experts: They
tell us what kinds of assumptions they want to make, and how they want to use the results of what we might
discover from their data. Inference is a computational and statistical problem. Given the assumptions and
data, the problem of estimating the conditional distribution is a well-defined mathematical problem. Model
checking and application again require the domain expert, who can identify what to expect and which areas
of the problem are important to success.

For  this  project,  we  developed  many  aspects  of  this  pipeline,  particularly  around  scalable  online  learning, 
model  checking,  and  recommendation  systems.  More  broadly,  we  worked  on  computational  algorithms  for 
fitting  models  (scalable  learning),  algorithms  for  aiding  domain  experts  to  build  models  (model  checking), 
and  real-world  applications  to  test  our  ideas  (recommendation).  We  went  beyond  the  scope  of  the  proposal  in 
several  ways,  exploring  applications  as  diverse  as  neuroscience,  sociology,  and  genetics. 


All  of  our  research  results  are  listed  at  the  end  of  this  report.  I  will  highlight  several  publications  of  note. 

The first is “Stochastic Variational Inference” (JMLR, 2013); this paper scaled up modern Bayesian
computation, allowing us to fit many complex models to massive data. In one way, it is the culmination of this
project.
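
As a rough illustration of the stochastic update behind this approach, here is a minimal sketch under strong simplifying assumptions: a hypothetical Beta-Bernoulli model with one global hidden variable and no local hidden variables, so the local step of the full algorithm is absent. Each iteration subsamples one observation, forms the variational parameters that would be optimal if the whole data set consisted of copies of that observation, and blends them in with a decreasing step size. The function name and the settings `tau` and `kappa` are illustrative choices.

```python
import random

def svi_beta_bernoulli(data, a0=1.0, b0=1.0, n_steps=20000,
                       tau=1.0, kappa=0.9, seed=0):
    """Fit q(theta) = Beta(lam_a, lam_b) with noisy natural-gradient steps."""
    rng = random.Random(seed)
    n = len(data)
    lam_a, lam_b = a0, b0                  # start q at the prior
    for t in range(n_steps):
        x = data[rng.randrange(n)]         # subsample a single observation
        # Intermediate parameters: as if the data were n replicates of x.
        hat_a = a0 + n * x
        hat_b = b0 + n * (1 - x)
        rho = (t + tau) ** (-kappa)        # step size, kappa in (0.5, 1]
        lam_a = (1.0 - rho) * lam_a + rho * hat_a
        lam_b = (1.0 - rho) * lam_b + rho * hat_b
    return lam_a, lam_b

# Hypothetical data: 700 ones and 300 zeros.
data = [1] * 700 + [0] * 300
lam_a, lam_b = svi_beta_bernoulli(data)
svi_mean = lam_a / (lam_a + lam_b)
exact_mean = (1.0 + 700) / (2.0 + 1000)    # mean of the exact posterior Beta(701, 301)
```

Each step touches one data point rather than all of them, which is what lets this style of update scale. In this conjugate toy case the result can be compared against the exact posterior; in realistic models no such reference is available.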

The  second  is  “Black  Box  Variational  Inference”  (AISTATS,  2014).  While  stochastic  variational  inference 
scaled  Bayesian  computation  up  to  massive  data,  black  box  variational  inference  expands  the  scope  of  scalable 
Bayesian  computation  to  models  that  were  previously  too  difficult  to  work  with. 
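
The "black box" idea can be sketched as follows: the optimizer only evaluates the log joint density of the model, and estimates the gradient of the variational objective from samples of the approximation using the score function. The example below is a simplified illustration, not the published algorithm in full. It assumes a hypothetical one-dimensional model (standard normal prior, unit-variance normal likelihood), a Gaussian approximation, a fixed step size, and a batch-mean baseline for variance reduction. All names and settings are illustrative.

```python
import math
import random

DATA = [2.0, 2.5, 1.5, 3.0, 2.0]  # hypothetical observations

def log_joint(z, data=DATA):
    """log p(x, z) for z ~ N(0, 1), x_i ~ N(z, 1), up to an additive constant."""
    return -0.5 * z * z - 0.5 * sum((x - z) ** 2 for x in data)

def bbvi(log_joint, n_steps=2000, n_samples=50, step=0.01, seed=0):
    """Fit q(z) = N(mu, exp(omega)^2) using only evaluations of log_joint."""
    rng = random.Random(seed)
    mu, omega = 0.0, 0.0
    for _ in range(n_steps):
        sigma = math.exp(omega)
        eps = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        zs = [mu + sigma * e for e in eps]              # samples from q
        # f(z) = log p(x, z) - log q(z), dropping constants.
        fs = [log_joint(z) + omega + 0.5 * e * e for z, e in zip(zs, eps)]
        baseline = sum(fs) / n_samples                  # simple control variate
        # Score of q: d/dmu log q = eps / sigma, d/domega log q = eps^2 - 1.
        g_mu = sum(e / sigma * (f - baseline)
                   for e, f in zip(eps, fs)) / n_samples
        g_omega = sum((e * e - 1.0) * (f - baseline)
                      for e, f in zip(eps, fs)) / n_samples
        mu += step * g_mu                               # stochastic gradient ascent
        omega += step * g_omega
    return mu, math.exp(omega)

mu, sigma = bbvi(log_joint)
# For this conjugate toy model the exact posterior is N(11/6, 1/6),
# which gives a reference point for the fitted mu and sigma.
```

Nothing in `bbvi` depends on the form of the model beyond the ability to evaluate `log_joint`, so swapping in a different model requires no new derivation. That is the sense in which the method widens the set of models that can be fit.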

Both of these algorithms, in retrospect, have had a significant impact. They are widely cited and widely
implemented in open-source software packages. Many of our other publications for this project adapted these
ideas, including, notably, a paper in Proceedings of the National Academy of Sciences (Gopalan et al., 2013)
on analyzing massive social networks.

Finally, I point out “Build, Compute, Critique, Repeat: Data Analysis with Latent Variable Models” (Annual
Review of Statistics and Its Application, 2014). This is a review article that outlines the full perspective of
modern applied probabilistic modeling, including inference, model checking, and applications.

Overall, this project was a success. Between 2011 and 2014, my group significantly advanced modern
Bayesian machine learning. We developed new and impactful algorithms, extended the field's scope to
new applications, and further developed the craft of iterative criticism and model-building.